Multi-lingual opinion mining on YouTube
Abstract
To successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on Opinion Mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.
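To make the contrast between tree kernels and bag-of-words concrete, here is a minimal sketch. It is not the authors' implementation, and it does not reproduce STRUCT: the abstract does not say which tree kernel is used, so the sketch assumes the subset tree kernel of Collins and Duffy as one common choice. The nested-tuple tree encoding, the example trees, and all function names (`tree_kernel`, `bow_kernel`, and the helpers) are hypothetical and exist only for illustration.

```python
from collections import Counter

# Assumed encoding: a node is a tuple (label, child, ...); a word is a plain string.


def internal_nodes(tree):
    """Return every non-leaf node of the tree, root included."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found


def production(node):
    """Describe a node by its label plus the labels or words directly below it."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:]
    )
    return (node[0], children)


def delta(n1, n2, lam):
    """Decay-weighted count of the tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # word children add no further fragments
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    """Subset tree kernel: sum delta over all pairs of internal nodes."""
    return sum(
        delta(a, b, lam) for a in internal_nodes(t1) for b in internal_nodes(t2)
    )


def words(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


def bow_kernel(t1, t2):
    """Bag-of-words baseline: dot product of word-count vectors."""
    c1, c2 = Counter(words(t1)), Counter(words(t2))
    return sum(c1[w] * c2[w] for w in c1)


# Two made-up shallow trees (chunk -> POS tag -> word) that differ in one word.
phone = ("S", ("NP", ("PRP", "i")),
         ("VP", ("VBP", "love"), ("NP", ("DT", "this"), ("NN", "phone"))))
video = ("S", ("NP", ("PRP", "i")),
         ("VP", ("VBP", "love"), ("NP", ("DT", "this"), ("NN", "video"))))

print("tree kernel:", tree_kernel(phone, video))
print("bag-of-words kernel:", bow_kernel(phone, video))
```

The bag-of-words kernel only sees that three words overlap, whereas the tree kernel also credits every shared structural fragment, such as the chunk and POS-tag skeleton that remains when a word is swapped. Either function could be used to fill a precomputed Gram matrix for a kernel classifier.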
Similar resources
A Multi-lingual Annotated Dataset for Aspect-Oriented Opinion Mining
We present the Trip-MAML dataset, a Multi-Lingual dataset of hotel reviews that have been manually annotated at the sentence-level with Multi-Aspect sentiment labels. This dataset has been built as an extension of an existent English-only dataset, adding documents written in Italian and Spanish. We detail the dataset construction process, covering the data gathering, selection, and annotation. ...
Opinion Mining on YouTube
This paper defines a systematic approach to Opinion Mining (OM) on YouTube comments by (i) modeling classifiers for predicting the opinion polarity and the type of comment and (ii) proposing robust shallow syntactic structures for improving model adaptability. We rely on the tree kernel technology to automatically extract and learn features with better generalization power than bag-of-words. An...
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies
We propose a cross-lingual framework for fine-grained opinion mining using bitext projection. The only requirements are a running system in a source language and word-aligned parallel data. Our method projects opinion frames from the source to the target language, and then trains a system on the target language using the automatic annotations. Key to our approach is a novel dependency-based mod...
Social Media Analytics for YouTube Comments: Issues, Gender and Sentiment
The need to elicit public opinion about predefined topics is widespread in the social sciences, government and business. Traditional survey-based methods are therefore being partly replaced by social media data mining but YouTube comments tend to be overlooked, despite the ongoing popularity of the site. This article introduces a systematic social media analytics strategy to gain insights about...
Enhancing Web intelligence with the content of online video fragments
This demo will show work to enhance a Web intelligence platform which crawls and analyses online news and social media content about climate change topics to uncover sentiment and opinions around those topics over time to also incorporate the content within non-textual media, in our case YouTube videos. YouTube contains a lot of organisational and individual opinion about climate change which c...
Journal title:
- Inf. Process. Manage.
Volume 52, Issue
Pages -
Publication date 2016